Automating Construction of Machine Learning Models with Clinical Big Data: Rationale and Methods
Abstract
Background: To improve health outcomes and cut healthcare costs, we often need to conduct prediction/classification using large clinical data sets, a.k.a. “clinical big data,” e.g., to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite their familiarity with data, healthcare researchers often lack the machine learning expertise needed to use clinical big data directly, which creates a hurdle in realizing value from their data. Healthcare researchers can work with data scientists who have deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a U.S. shortage of data scientists and hiring competition from companies with deep pockets, healthcare systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select: a) hyper-parameter values and complex algorithms that greatly affect model accuracy, and b) operators and periods for temporally aggregating clinical attributes (e.g., whether a patient’s weight kept rising in the past year). This process becomes infeasible with limited budgets.
Objective: This study’s goal is to enable healthcare researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data.
Methods: This study will: 1) finish developing new software Auto-ML (Automated Machine Learning) to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance, 2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers, and 3) perform simulations to estimate the impact of adopting Auto-ML on U.S. patient outcomes.
Results: We are currently writing Auto-ML’s design document. We intend to finish our study in around five years.
Conclusions: Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, healthcare researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in healthcare and improve patient outcomes.
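The abstract's notion of choosing "operators and periods" for temporal aggregation can be made concrete with a small illustration. The following is only a sketch under stated assumptions, not Auto-ML's actual method: the measurements, the helper names (`in_period`, `kept_rising`, `mean`), and the feature names are all hypothetical. It shows how one raw attribute (weight) yields different features depending on which operator and look-back period are picked.

```python
from datetime import date, timedelta

# Hypothetical weight measurements (date, kg) for one patient; illustrative values only.
weights = [
    (date(2015, 1, 10), 80.0),
    (date(2015, 4, 12), 81.5),
    (date(2015, 8, 3), 83.0),
    (date(2015, 12, 20), 84.2),
]

def in_period(records, as_of, days):
    """Keep records dated within the `days` days up to and including `as_of`."""
    start = as_of - timedelta(days=days)
    return [(d, v) for d, v in records if start <= d <= as_of]

def kept_rising(records):
    """Operator: True if every value exceeds the one before it (needs >= 2 values)."""
    values = [v for _, v in sorted(records)]
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))

def mean(records):
    """Operator: average value over the period, or None if there are no records."""
    values = [v for _, v in records]
    return sum(values) / len(values) if values else None

as_of = date(2015, 12, 31)

# Each (operator, period) pair produces one candidate feature for a model.
features = {
    "weight_kept_rising_365d": kept_rising(in_period(weights, as_of, 365)),
    "weight_mean_180d": mean(in_period(weights, as_of, 180)),
}
print(features)
```

With many attributes, operators, and candidate periods, the number of such combinations grows quickly, which is the manual selection burden the abstract describes.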
Related resources
Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods
BACKGROUND To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many c...
Thermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning
Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...
PredicT-ML: a tool for automating machine learning model building with big clinical data
BACKGROUND Predictive modeling is fundamental to transforming large clinical data sets, or "big clinical data," into actionable knowledge for various healthcare applications. Machine learning is a major predictive modeling approach, but two barriers make its use in healthcare challenging. First, a machine learning tool user must choose an algorithm and assign one or more model parameters called...
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning
We introduce a means of automating machine learning (ML) for big data tasks, by performing scalable stochastic Bayesian optimisation of ML algorithm parameters and hyper-parameters. More often than not, the critical tuning of ML algorithm parameters has relied on domain expertise from experts, along with laborious handtuning, brute search or lengthy sampling runs. Against this background, Bayes...